ISOcat: Corralling Data Categories in the Wild
Abstract
To achieve true interoperability for valuable linguistic resources, different levels of variation need to be addressed. ISO Technical Committee 37, Terminology and other language and content resources, is developing a Data Category Registry that will provide a reusable set of data categories. A new implementation of the registry, dubbed ISOcat, is currently under construction. This paper briefly describes the new data model for data categories that will be introduced in this implementation, then sketches the standardization process. Completed data categories can be reused by the community, either by selecting data categories in the ISOcat web interface or through other tools that interact with the ISOcat system via one of its Application Programming Interfaces. Linguistic resources that use data categories from the registry should include persistent references, e.g. in the metadata or schemata of the resource, which point back to their origin. These data category references can then be used to determine whether two or more resources share common semantics, thus providing a level of interoperability close to the source data and a promising layer for semantic alignment at higher levels.
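The comparison of data category references described in the abstract can be illustrated with a minimal sketch. It assumes each resource exposes a mapping from its local element names to persistent data category identifiers. The function name `shared_categories`, the element names, and the `example.org` identifiers are hypothetical placeholders, not actual ISOcat entries or part of any ISOcat API.

```python
# Hypothetical sketch: all names and identifiers below are placeholders,
# not real ISOcat data categories or API calls.

def shared_categories(resource_a, resource_b):
    """Return the persistent identifiers referenced by both resources,
    each mapped to the pair of local names that point to it."""
    # Invert each mapping: persistent identifier -> local element name.
    names_a = {pid: name for name, pid in resource_a.items()}
    names_b = {pid: name for name, pid in resource_b.items()}
    # Identifiers present in both resources indicate shared semantics.
    common = names_a.keys() & names_b.keys()
    return {pid: (names_a[pid], names_b[pid]) for pid in sorted(common)}


# Two resources using different local names for the same data category.
lexicon = {
    "pos": "https://example.org/datcat/DC-1",
    "gender": "https://example.org/datcat/DC-2",
}
corpus = {
    "PartOfSpeech": "https://example.org/datcat/DC-1",
    "lemma": "https://example.org/datcat/DC-3",
}

print(shared_categories(lexicon, corpus))
```

Here the two resources name the element differently (`pos` and `PartOfSpeech`) but reference the same identifier, so a tool could treat them as semantically equivalent without comparing the local labels.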
Similar resources
Experiences with the ISOcat Data Category Registry
The ISOcat Data Category Registry has been a joint project of both ISO TC 37 and the European CLARIN infrastructure. In this paper the experiences of using ISOcat in CLARIN are described and evaluated. This evaluation clarifies the requirements of CLARIN with regard to a semantic registry to support its semantic interoperability needs. A simpler model based on concepts instead of data categorie...
Towards standardized descriptions of linguistic features: ISOcat and procedures for using common data categories
Since 2009, the Max Planck Institute for Psycholinguistics in Nijmegen has offered a web-based open source reference implementation of the ISO DCR (Data Category Registry, ISO 12620:2009), which is called ISOcat (“Data Category Registry for ISO TC 37”). ISOcat describes the data model and procedures for DCR. The talk presents the current stage of the development and the status of ISOcat, and demons...
Linking to Linguistic Data Categories in ISOcat
ISO Technical Committee 37, Terminology and other language and content resources, established an ISO 12620:2009 based Data Category Registry (DCR), called ISOcat (see http://www.isocat.org), to foster semantic interoperability of linguistic resources. However, this goal can only be met if the data categories are reused by a wide variety of linguistic resource types. A resource indicates its usa...
ISOcat Data Categories for Signed Language Resources
As the creation of signed language resources is gaining speed worldwide, the need for standards in this field becomes more acute. This paper discusses the state of the field of signed language resources, their metadata descriptions, and annotations that are typically made. It then describes the role that ISOcat may play in this process and how it can stimulate standardisation without imposing s...
RELcat: a Relation Registry for ISOcat data categories
The ISOcat Data Category Registry basically contains a flat and easily extensible list of data category specifications. To foster reuse and standardization, only very shallow relationships among data categories are stored in the registry. However, to assist crosswalks, possibly based on personal views, between various (application) domains and to overcome possible proliferation of data categorie...